Abstract: There are a number of in silico programs that use algorithms and external web sources 
to predict the effect of single nucleotide polymorphisms (SNPs). While many of these programs 
have been shown to predict accurately the effect of SNPs in functional areas of the gene, such 
as 5' upstream or coding regions, empiric research may be warranted to confirm the functional 
consequences of SNPs that are predicted to have little to no effect. We compared predictions 
from FASTSNP (Function Analysis and Selection Tool for Single Nucleotide Polymorphism) 
and F-SNP (Functional Single Nucleotide Polymorphism) with experimentally derived 
genotype-phenotype correlations to determine the accuracy of these programs in predicting SNP 
functionality. We used normal colon tissue to evaluate 24 TagSNPs within six genes. Two of 
16 SNPs that were predicted to have no functional effect in FASTSNP were significantly 
associated with gene expression. Only one of the eight SNPs that were predicted to have a low to high 
effect was significantly associated with gene expression. The two in silico programs gave 
similar results for the SNPs predicted by FASTSNP to have no effect; however, of the SNPs 
with FASTSNP scores from low to high, three received an F-SNP score below the threshold 
that is considered functionally significant. In silico programs can fail to identify functional 
SNPs, supporting a continuing role for empiric analysis of SNP function. Laboratory analysis 
is necessary to identify causal SNPs accurately, establish biological plausibility of the effect, 
and ultimately inform cancer prevention strategies. 
Keywords: in silico prediction, colon, single nucleotide polymorphisms 

Introduction 

The ability to link functional genetic variants with disease risk leads to advances 
in diagnostics and therapeutics. 1 Over 10 million single nucleotide polymorphisms 
(SNPs) have been reported 2 with an estimated 100,000-300,000 that alter an amino 
acid. 3 In silico prediction programs have been developed to identify SNPs with pos- 
sible functional effects. 4 Several programs are available, each with unique algorithms 
to assess the potential effect of an amino acid sequence substitution. 5 For instance, 
FASTSNP (Function Analysis and Selection Tool for Single Nucleotide Polymorphism) 
utilizes web wrapper agents to gather information from 11 different web servers to 
offer real-time information on phenotypic risk and functional effects, and F-SNP 
(Functional Single Nucleotide Polymorphism) uses 16 different tools and databases in 
an integrated fashion to predict functionality based on splicing, transcription, 
translation, and post-translation. 4,6 

These programs are useful in prioritizing SNPs for genotyping, as well as for 
more detailed functional analyses. A large survey of many of these programs showed 
a high level of consistency between programs in identifying 
high-risk/high-priority SNPs for colon cancer research. 7 
However, evolving research supports a functional role for 
intronic SNPs. For example, an intronic SNP associated with 
acute lung injury and asthma regulates promoter activity of 
smMLCK, 8 another in PRRX2 has been shown to interact 
with the conditioning region in KLK2-KLK3, 9 and yet 
another in the GH1 gene that is associated with reduced 
colorectal cancer risk was shown to decrease GH1 expression. 10 
Each of these intronic SNPs is predicted to have no to low 
risk of effect in either the in silico FASTSNP or F-SNP 
prediction programs. 

To explore the accuracy of predictive models with SNP 
functionality, identified tagSNPs were correlated with gene 
expression in normal colon tissue. Empiric results were 
then compared with the in silico risk prediction programs, 
FASTSNP and F-SNP. 

Materials and methods 

Tissue samples 

Deidentified normal frozen colon tissues (n = 82) were 
obtained from the Cooperative Human Tissue Network, 
funded by the National Cancer Institute, and stored at -80°C. 
Of the sample population, 54% were male and 46% were 
female. The tissue donors were aged 17-92 (mean 60.48) 
years and were of Caucasian (n = 51), African American 
(n = 23), Asian (n = 1), and unknown (n = 7) origin. 

Reverse transcription and quantitative 
real-time polymerase chain reaction 

Total DNA was isolated from normal colon tissue samples 
using the AllPrep DNA/RNA/Protein Mini Kit (Qiagen, 
Valencia, CA, USA). Total RNA was isolated utilizing Trizol 
(Invitrogen, Grand Island, NY, USA) for homogenization, and 
the RNeasy Mini kit (Qiagen) for isolation using a protocol 
developed by Mauricio Rodriquez-Lanetty (unpublished) 
with minor alterations. Briefly, tissues (about 25 mg) were 
homogenized in 150 µL Trizol using a Bullet Blender and 
stainless steel beads. The homogenate was placed in a new 
vial with 450 µL of Trizol. After adding 100 µL of chloroform, 
the vials were shaken well, incubated for 2 minutes 
at room temperature, and centrifuged, and the supernatant was 
placed in a new vial. An equal volume of 100% ethanol was added, 
and the mixture was placed in an RNeasy spin column. RNA was 
washed and eluted according to the RNeasy protocol. 

First strand cDNA synthesis was performed using the 
High Capacity RNA-to-cDNA kit (ABI, Carlsbad, CA, USA) 
on 500 ng total RNA, as measured by an RNA 6000 Nano kit 
(Agilent, Santa Clara, CA, USA). Quantitative real-time reverse 
transcription polymerase chain reaction (PCR) was 
performed on the ABI 7900HT Fast Real Time PCR System 
using TaqMan primer/probe sets and TaqMan Fast Universal 
PCR Master Mix no AmpErase® UNG (ABI). Experiments 
were run as per the manufacturer's protocol in triplicate on 
cDNA diluted 1:10 for 50 PCR cycles, retaining those with 
standard deviations <1 (exclusions: IFNGR2 [1], IL1B [1]). 
Samples were normalized to β-actin, discarding those with 
β-actin Ct (cycle threshold) >30 (IFNGR1 [4], IFNGR2 [4], 
IL1B [5], LEPR [1], RPS6KB1 [1], TSC2 [4]). Ct values for genes 
of interest that were >40 or undetermined were set to 40. β-actin 
was chosen as the housekeeping gene because, in normal colon 
tissue, it has been shown that structural housekeeping genes such 
as β-actin have less variation than metabolic housekeeping genes 
such as glyceraldehyde 3-phosphate dehydrogenase. 11 
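
The filtering rules described above can be sketched in code. The following Python sketch is illustrative only and is not the authors' pipeline. Only the three thresholds (standard deviation <1, β-actin Ct >30, cap at 40) come from the text. The function names, the use of the triplicate mean as the per-sample Ct, the application of the cap before the standard deviation filter, and the application of the standard deviation filter to the β-actin wells are all assumptions.

```python
from statistics import mean, stdev
from typing import Optional, Sequence, Tuple

CT_CAP = 40.0        # from the text: Ct > 40 or undetermined is set to 40
ACTIN_CT_MAX = 30.0  # from the text: samples with beta-actin Ct > 30 are discarded
SD_MAX = 1.0         # from the text: triplicates are retained only if SD < 1


def cap_ct(ct: Optional[float]) -> float:
    """Map an undetermined (None) or > 40 Ct for a gene of interest to 40."""
    return CT_CAP if ct is None or ct > CT_CAP else ct


def triplicate_mean(cts: Sequence[float]) -> Optional[float]:
    """Mean Ct of replicate wells, or None when the SD is not below 1."""
    return mean(cts) if stdev(cts) < SD_MAX else None


def qc_sample(gene_cts: Sequence[Optional[float]],
              actin_cts: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (gene Ct, beta-actin Ct) if the sample passes QC, else None."""
    # Assumption: the cap is applied before the SD filter.
    gene = triplicate_mean([cap_ct(ct) for ct in gene_cts])
    actin = triplicate_mean(actin_cts)
    if gene is None or actin is None or actin > ACTIN_CT_MAX:
        return None
    return gene, actin


if __name__ == "__main__":
    # Hypothetical wells, for illustration only.
    print(qc_sample([28.0, 28.2, 28.1], [20.0, 20.1, 19.9]))  # passes QC
    print(qc_sample([28.0, None, 28.1], [20.0, 20.1, 19.9]))  # fails the SD filter
```

Under these assumptions, a single undetermined well in an otherwise tight triplicate causes the sample to be dropped, because capping it at 40 inflates the standard deviation. A different ordering of the two rules would behave differently.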

TagSNP selection and genotyping 

TagSNPs were selected using the following parameters: 
linkage disequilibrium (LD) blocks defined at r² = 0.8 using a Caucasian LD map, 
minor allele frequency >0.1, range -1,500 base pairs from 
the initiation codon to +1,500 base pairs from the termination 
codon, and one SNP/LD bin. All markers were genotyped 
using a multiplexed bead array assay format based on 
GoldenGate chemistry (Illumina, San Diego, CA, USA). 
A genotyping call rate of 99.93% was attained. Blinded 
internal replicates represented 1.6% of the sample set. The 
duplicate concordance rate was 99.996%. 

In silico prediction programs 

Two in silico programs were used. FASTSNP is a web- 
based tool for assessing phenotypic effects of SNPs through 
the use of external web servers and a prediction algorithm. 
FASTSNP uses a ranking system from 0 (no known effect) 
to 5 (very high risk) based on location of the SNP (eg, 5' 
upstream, 3' untranslated region, intronic) and possible 
functional effects such as amino acid changes, alterations 
in splicing sites, and "premature translation termination". 6 
F-SNP also utilizes bioinformatic tools and websites to pre- 
dict the functional effects of SNPs. The process has several 
steps, with each step determining the next. For instance, if a 
mutation is found in the coding region through Ensembl, the 
information is then submitted to an outside bioinformatics 
website, such as PolyPhen, to test for functional effect. 4 

Statistical analysis 

Identified TagSNPs for 34 genes (CYP19A1, IFNG, IFNGR1, 
IFNGR2, IKBKB, IL10, IL15, IL17A, IL1A, IL1B, IL1RN, IL2, 
IL23R, IL2RA, IL4, IL6, IL6R, IL8, LEPR, MTOR, NFKB1, 
PDGFB, PDK1, PIK3CA, PRKAG2, PTEN, RPS6KB1, 
RPS6KB2, STAT3, STAT5B, TGFB1, TNF, TSC2, VEGFA) 
were entered into the FASTSNP website, and predicted risk 
values were noted. Six genes (IFNGR1, IFNGR2, IL1B, 
LEPR, RPS6KB1, and TSC2) were identified as having SNPs 
that were predicted to have a score of either 2-3 or 3-4 (low 
to medium or medium to high risk of effect, respectively). 
From these six genes, tagSNPs with a score of 0-0 (no or 
unknown risk, n = 16) or with a score of either 2-3 or 3-4 
(low to high risk, n = 8) were chosen for further comparison 
with phenotype data. Results from F-SNP were based on 
transcriptional regulation and marked either "changed"/"not 
changed" or "exist"/"not exist." A functional significance 
score is given, with a score of ≥0.5 being considered likely 
to lead to functional changes. 12 The TagSNPs chosen for 
FASTSNP prediction were entered into the F-SNP prediction 
program and compared with both phenotype data and with 
FASTSNP predictions in order to assess similarity between 
prediction programs. 

Statistical analyses were performed using SAS version 9.3 
(SAS Institute, Cary, NC, USA). The level of expression for 
the candidate gene was calibrated to the expression of the 
housekeeping gene to generate the change in Ct (ΔCt). Expression 
levels were calculated by taking 2^ΔCt, and the median of those 
values was assessed by genotype. A codominant model was 
initially assumed, but if a dominant or recessive model fitted 
the data better, that model was evaluated and is presented. 
P-values comparing median expression levels across genotypes 
are based on Wilcoxon rank-sum and Kruskal-Wallis 
rank-sum tests. Statistical significance was set at P < 0.05. 
SNP associations were performed among Caucasians and 
African Americans separately, and the directions of the 
associations are the same for both races for the three leptin 
receptor SNPs that were reported as being significant (rs8179183, 
rs9436301, rs4655537). Race was not associated with gene 
expression. Expression was also not statistically significantly 
different by age or gender. 
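
The expression calculation and the rank-based comparison described above can be illustrated with a short Python sketch. It is not the SAS analysis used in the study. It assumes that ΔCt is defined as Ct(β-actin) minus Ct(gene), so that 2^ΔCt rises with expression, and it uses the standard tie-corrected Kruskal-Wallis statistic with a chi-square approximation for the P-value. Function names and the example data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def relative_expression(ct_gene: float, ct_actin: float) -> float:
    """2^dCt, with dCt = Ct(beta-actin) - Ct(gene) (assumed sign convention)."""
    return 2.0 ** (ct_actin - ct_gene)


def median_by_genotype(samples):
    """samples: iterable of (genotype, expression) -> {genotype: median}."""
    groups = defaultdict(list)
    for genotype, expression in samples:
        groups[genotype].append(expression)
    return {g: median(values) for g, values in groups.items()}


def kruskal_wallis(groups):
    """Return (H, P) for 2 or 3 groups using the chi-square approximation."""
    k = len(groups)
    if k not in (2, 3):
        raise ValueError("this sketch handles only 2 or 3 genotype groups")
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of = {}     # value -> average rank (ties share the mean rank)
    tie_term = 0.0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3.0 * (n + 1)
    h = max(h / (1.0 - tie_term / (n ** 3 - n)), 0.0)
    # Chi-square survival function in closed form for 1 or 2 degrees of freedom.
    p = math.erfc(math.sqrt(h / 2.0)) if k == 2 else math.exp(-h / 2.0)
    return h, p
```

With two genotype groups (a dominant or recessive model) the test has one degree of freedom, and with three groups (a codominant model) it has two, which is why the sketch needs only those two closed forms. For example, `kruskal_wallis([[1, 2, 3], [4, 5, 6]])` compares two hypothetical groups of three samples each.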

Results 

Predicted and actual effects 
in normal colon samples 

The predicted FASTSNP and F-SNP effects and gene expression 
association P-values of the 24 TagSNPs are presented 
in Table 1. Of 16 SNPs predicted to have no/unknown (0-0) 
effect, two (LEPR rs4655537 and rs9436301) were found to 
be significantly associated with gene expression (Table 2). 
The common homozygous LEPR rs4655537 genotype (GG) 
is associated with a 1.7-fold increase (P = 0.01) in expression 
of LEPR compared with the heterozygous or homozygous 
variant (GA/AA) genotype. The CC variant LEPR 
rs9436301 genotype is associated with a 1.52-fold increase 
in gene expression (P = 0.04) as compared with the CT/TT 
genotype. 

Table 1 Prediction scores and association with gene expression 

Gene      SNP          FASTSNP score   F-SNP score   P-value for SNP association with expression
IFNGR1    rs1327475    2-3             0.176         0.26
          rs9376267    0-0             0.208         0.90
IFNGR2    rs9808753    3-4             0.633         0.35
          rs9976971    0-0             0.5           0.52
IL1B      rs1143634    2-3             0.330         0.92
          rs1143633    0-0             0.268         0.29
LEPR      rs1137101    3-4             0.291         0.28
          rs8179183    3-4             0.533         0.048
          rs1805096    2-3             0.5           0.15
          rs12145690   0-0             0.217         0.83
          rs9436301    0-0             0.141         0.04
          rs6704167    0-0             0.176         0.87
          rs1171271    0-0             0.242         0.84
          rs6673324    0-0             0.109         0.78
          rs12059300   0-0             0.065         0.83
          rs4655537    0-0             0.158         0.01
          rs1938484    0-0             0.242         0.28
RPS6KB1   rs180523     3-4             1             0.63
          rs8071475    0-0             0.208         0.20
          rs180515     0-0             0.276         0.42
TSC2      rs1051771    2-3             0.568         0.99
          rs2073636    0-0             0.242         0.74
          rs30259      0-0             0.176         0.12
          rs3087631    0-0             0.050         0.19

Abbreviations: SNP, Single Nucleotide Polymorphism; FASTSNP, 
Function Analysis and Selection Tool for Single Nucleotide 
Polymorphism; F-SNP, Functional Single Nucleotide Polymorphism. 

Of the eight tagSNPs that were predicted to have a low to 
high effect (2-3 or 3-4) in the FASTSNP program, only LEPR 
rs8179183 was significantly associated with gene expression. 
The common homozygous genotype (GG) was associated 
with a 1.6-fold decrease (P = 0.048) in expression compared 
with the heterozygote and homozygous variant (GC/CC). 
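
The fold changes quoted for the three LEPR SNPs can be reproduced from the median expression values reported in Table 2. The following Python sketch shows the arithmetic. The dictionary layout and the function name are illustrative and are not taken from the original analysis.

```python
# Median expression values (2^dCt x 10^4) by genotype group, as reported in Table 2.
TABLE2_MEDIANS = {
    "rs4655537": {"GG": 59.7931, "GA/AA": 33.6846},
    "rs9436301": {"CC": 67.5232, "TT/TC": 44.3307},
    "rs8179183": {"GG": 40.6433, "GC/CC": 65.1387},
}


def fold_change(higher: float, lower: float) -> float:
    """Ratio of two median expression values."""
    return higher / lower


# rs4655537: GG relative to GA/AA (reported as a 1.7-fold increase)
rs4655537_fc = fold_change(TABLE2_MEDIANS["rs4655537"]["GG"],
                           TABLE2_MEDIANS["rs4655537"]["GA/AA"])

# rs9436301: CC relative to TT/TC (reported as a 1.52-fold increase)
rs9436301_fc = fold_change(TABLE2_MEDIANS["rs9436301"]["CC"],
                           TABLE2_MEDIANS["rs9436301"]["TT/TC"])

# rs8179183: GC/CC relative to GG, i.e. the size of the decrease seen in GG
# (reported as a 1.6-fold decrease)
rs8179183_fc = fold_change(TABLE2_MEDIANS["rs8179183"]["GC/CC"],
                           TABLE2_MEDIANS["rs8179183"]["GG"])

for snp, fc in [("rs4655537", rs4655537_fc),
                ("rs9436301", rs9436301_fc),
                ("rs8179183", rs8179183_fc)]:
    print(f"{snp}: {fc:.2f}-fold")
```

Because the scaling factor of 10^4 applies to both medians, it cancels in each ratio.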

When compared, FASTSNP and F-SNP scores were similar, 
although not entirely consistent (Table 1). For TagSNPs 
that were predicted to have no (0-0) effect in FASTSNP, the 
F-SNP score was below 0.5, the score at which a SNP is 
likely to lead to functional changes. Of the eight SNPs that 
were predicted to have a low to medium (2-3) or medium to 
high (3-4) effect with FASTSNP, five received a functional 
significance score ≥0.5. The other three ranged in scores 
from 0.176 to 0.330, causing their prediction to match the 
genotype/phenotype results better. While four of the five 
tagSNPs with a functional significance score ≥0.5 hovered 
near 0.5 (0.5-0.633), one (RPS6KB1 rs180523) had a 
functional significance score of 1. RPS6KB1 rs180523 also had 
a FASTSNP score of 3-4, but the expression results showed 
no statistically significant differences in expression across 
genotypes (P = 0.63). 

Table 2 SNPs with significant association with gene expression 

Gene   SNP / genotype   N    Gene expression(a)   Kruskal-Wallis P-value   FASTSNP score   F-SNP score
LEPR   rs8179183
         GG             54   40.6433              0.048                    3-4             0.533
         GC/CC          27   65.1387
       rs9436301
         TT/TC          70   44.3307              0.043                    0-0             0.141
         CC             11   67.5232
       rs4655537
         GG             36   59.7931              0.011                    0-0             0.158
         GA/AA          45   33.6846

Note: (a) Gene expression values are median 2^ΔCt × 10^4. 

Abbreviations: SNP, Single Nucleotide Polymorphism; FASTSNP, Function Analysis and Selection Tool for Single Nucleotide Polymorphism. 
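
The counts reported in this section can be checked by tallying the scores and P-values listed in Table 1. The Python sketch below does this for the two FASTSNP groups. The data are transcribed from Table 1. The category labels, field names, and function names are illustrative, and the F-SNP cutoff is applied as "0.5 or higher", as described in the text.

```python
# (gene, SNP, FASTSNP score, F-SNP score, P-value), transcribed from Table 1.
TABLE1 = [
    ("IFNGR1", "rs1327475", "2-3", 0.176, 0.26),
    ("IFNGR1", "rs9376267", "0-0", 0.208, 0.90),
    ("IFNGR2", "rs9808753", "3-4", 0.633, 0.35),
    ("IFNGR2", "rs9976971", "0-0", 0.5, 0.52),
    ("IL1B", "rs1143634", "2-3", 0.330, 0.92),
    ("IL1B", "rs1143633", "0-0", 0.268, 0.29),
    ("LEPR", "rs1137101", "3-4", 0.291, 0.28),
    ("LEPR", "rs8179183", "3-4", 0.533, 0.048),
    ("LEPR", "rs1805096", "2-3", 0.5, 0.15),
    ("LEPR", "rs12145690", "0-0", 0.217, 0.83),
    ("LEPR", "rs9436301", "0-0", 0.141, 0.04),
    ("LEPR", "rs6704167", "0-0", 0.176, 0.87),
    ("LEPR", "rs1171271", "0-0", 0.242, 0.84),
    ("LEPR", "rs6673324", "0-0", 0.109, 0.78),
    ("LEPR", "rs12059300", "0-0", 0.065, 0.83),
    ("LEPR", "rs4655537", "0-0", 0.158, 0.01),
    ("LEPR", "rs1938484", "0-0", 0.242, 0.28),
    ("RPS6KB1", "rs180523", "3-4", 1.0, 0.63),
    ("RPS6KB1", "rs8071475", "0-0", 0.208, 0.20),
    ("RPS6KB1", "rs180515", "0-0", 0.276, 0.42),
    ("TSC2", "rs1051771", "2-3", 0.568, 0.99),
    ("TSC2", "rs2073636", "0-0", 0.242, 0.74),
    ("TSC2", "rs30259", "0-0", 0.176, 0.12),
    ("TSC2", "rs3087631", "0-0", 0.050, 0.19),
]

FSNP_THRESHOLD = 0.5  # F-SNP score considered likely to lead to functional changes
ALPHA = 0.05          # significance level used for the expression association


def fastsnp_category(score: str) -> str:
    """Group FASTSNP scores into 'none' (0-0) or 'low_to_high' (2-3 or 3-4)."""
    return "none" if score == "0-0" else "low_to_high"


def summarize(rows):
    """Tally, per FASTSNP group, the SNP count, F-SNP calls, and significant SNPs."""
    summary = {}
    for _gene, snp, fastsnp, fsnp, p_value in rows:
        group = summary.setdefault(
            fastsnp_category(fastsnp),
            {"n": 0, "fsnp_functional": 0, "significant": []},
        )
        group["n"] += 1
        group["fsnp_functional"] += int(fsnp >= FSNP_THRESHOLD)
        if p_value < ALPHA:
            group["significant"].append(snp)
    return summary


if __name__ == "__main__":
    for category, counts in summarize(TABLE1).items():
        print(category, counts)
```

Keeping the FASTSNP grouping, the F-SNP cutoff, and the significance level as separate named values makes it straightforward to see how the comparison would change under a different cutoff.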

Discussion 

Differentiating between SNPs that may be deleterious and 
those that are "benign" is critical to risk assessment and the 
design of cancer prevention strategies. 5 With the human 
genome being home to potentially millions of SNPs, labora- 
tory discovery of individual SNPs is a daunting task. For this 
reason, in silico programs have emerged to assist in choos- 
ing functional SNPs. These programs use readily available 
scientific data and bioinformatics to offer predictions on the 
functional effects of SNPs. This study sought to determine 
genotype-phenotype relationships empirically, and found 
that a zero risk of effect in an in silico prediction program 
does not guarantee a lack of effect of certain SNPs in human 
colon samples. 

In an effort to explore this in relation to gene expression, 
82 colon samples were genotyped and phenotyped for the 
24 TagSNPs predicted by FASTSNP to have either no effect 
(0-0) or a low to medium or medium to high effect (2-3 or 
3-4, respectively). Our results showed that two of the 16 SNPs 
that were predicted to have no effect had a significant 
association with gene expression. In the eight SNPs with a predicted 
low to high effect, only one showed a significant association 
with gene expression. 

Not all prediction programs generate similar results. The 
databases and external websites employed by each program 
are different (although there is some overlap), and unique 
algorithms are likely to generate disparate results. Thus, 
FASTSNP results were compared with those of F-SNP. 
F-SNP combines accumulated results into a single "functional 
significance score," with a score of >0.5 considered likely to 
lead to functional changes, given that this is the median score 
for known disease-related SNPs. 4 For these data, FASTSNP 
and F-SNP scores corresponded for SNPs predicted to have 
no known effect. However, they did not match for all SNPs 
that were predicted to have a low to high effect. 

There is a chance that the lack of correlation is due to 
the small sample size. Also, the functionality of SNPs is 
not limited to RNA expression, and prediction programs 
are designed to explore other dimensions of functionality, 
such as amino acid changes and alterations in splicing sites. 
This may explain a portion of the high-priority SNPs that 
showed no change in mRNA expression. Further functionality 
experiments would be necessary to explore other mecha- 
nisms of action, such as post-translational modification, 
protein expression, and protein function, specifically with 
the leptin receptor protein. There may also be organ-specific 
differences in gene expression, which may have impacted 
the results shown here. This further necessitates laboratory 
functionality studies and inspection of low-priority SNPs 
in a case-by-case manner. It is also possible that the SNPs 
chosen for analysis are not truly functional SNPs, but exist 
in tight linkage with the causative SNP. For this reason also, 
biochemical studies are necessary to define the mechanistic 
basis of the noted associations. 

There are a few examples of comparison of FASTSNP 
and functional in vitro experiments. However, these only 
focus on the high-priority SNPs. For example, a study in the 
Chinese Han population found that two cystathionine 
gamma-lyase SNPs (rs482843 and rs1021737) were identified by 
FASTSNP as high-priority SNPs, yet showed no 
significant contribution to the risk of essential hypertension 
in this population. 13 On the other hand, one in vitro study 
created a p16INK4A protein (from the CDKN2A gene) based 
on SNPs identified as high-priority by FASTSNP and other 
in silico programs, and found that CDKN2A rs11552822 may 
lead to a decrease in binding affinity for CDK6, and may be 
involved in the development of malignant melanoma. 14 

In silico programs have been shown to be accurate when 
predicting functional effects with SNPs that rank very high 
on their prediction list, and certainly these higher-risk SNPs 
may be prioritized in laboratory-based research. However, 
it is not likely that they stand alone in the progression of 
complex disease. 15 Thus, SNPs that are ranked as "no risk" 
by in silico programs may actually have an effect on gene 
expression, which may, in turn, lead to an effect on protein 
abundance and subsequent functioning of the enzyme. For 
example, the no to low priority GH1 rs2665802 has been 
associated with both a decrease in human growth hormone 
gene expression and growth hormone secretion. It was noted 
that this SNP may work in conjunction with other SNPs 
not studied, but the contribution of the SNP was found to 
be direct. 10 

Even low to medium effects on enzymatic activity 
may play an important role in the development of disease. 
Therefore, functional analyses of these low risk SNPs are 
necessary to capture fully the genotypic contributions to 
phenotype. This information is critical in determining 
the biological basis of variability, and can potentially 
aid in the design of rational intervention/prevention 
strategies. 